Abstract

Multi-agent games are becoming an increasingly prevalent formalism for the study of electronic commerce and auctions. The speed at which transactions can take place and the growing complexity of electronic marketplaces make the study of computationally simple agents an appealing direction. In this work, we analyze the behavior of agents that incrementally adapt their strategy through gradient ascent on expected payoff, in the simple setting of two-player, two-action, iterated general-sum games, and present a surprising result. We show that either the agents will converge to a Nash equilibrium, or if the strategies themselves do not converge, then their average payoffs will nevertheless converge to the payoffs of a Nash equilibrium.

1 Introduction 

It is widely expected that in the near future, software agents will act on behalf of humans in many electronic marketplaces based on auctions, barter, and other forms of trading. This makes multi-agent game theory (Owen, 1995) increasingly relevant to the emerging electronic economy. There are many different formalisms within game theory that model interaction between competing agents. Our interest is in iterated games, which model situations where a set of agents or players repeatedly interact with each other in the same game. There is a long and illustrious history of research in iterated games, such as the study of (and even competitions in) solving the iterated prisoner's dilemma (Owen, 1995). Of particular interest in iterated games is the possibility of the players adapting their strategy based on the history of interaction with the other players.

Many different algorithms for adaptive play in iterated games have been proposed and analyzed. For example, in fictitious play, each player maintains a model of the mixed strategy of the other players based on the empirical play so far, and always plays the best response to this model at each iteration (Owen, 1995). While it is known that the time averages of the strategies played form a Nash equilibrium, the strategies themselves do not converge to Nash, nor are the averaged payoffs to the players guaranteed to be Nash. Kalai and Lehrer (1993) proposed a Bayesian strategy for players in a repeated game that requires the players to have "informed priors", and showed that under this condition play converges to a Nash equilibrium. A series of recent results has shown that the informed prior condition is actually quite restrictive, limiting the applicability of this result.

These seminal results implicitly assume that unbounded computation is allowed at each step. In contrast, we envision a future in which agents may maintain complex parametric representations of either their own strategy or their opponents' strategies, and in which full Bayesian updating or computation of best responses is computationally intractable. In other words, as in the rest of artificial intelligence and machine learning, in order to act efficiently in a complex environment, agents will adopt both representational and computational restrictions on their behavior in iterated games and other settings (e.g., Papadimitriou and Yannakakis (1994) and Freund et al. (1995)).

Perhaps the most common algorithms within machine learning are those that proceed by gradient ascent or descent (or other local methods) on some appropriate objective function. In this paper we study the behavior of players adapting by gradient ascent in expected payoff in two-person, two-action, general-sum iterated games. Thus, here we study a specific and very simple adaptive strategy in a setting in which a general mixed strategy is easy to represent. Such a study is a prerequisite to an understanding of gradient methods on rich, parametric strategy representations. While it is known from the game theory literature that the strategies computed by gradient ascent in two-person iterated games need not converge, we present a new and perhaps surprising result here. We prove that although the strategies of the two players may not always converge, their average payoffs always do converge to the expected payoffs of some Nash equilibrium. Thus, the dynamics of gradient ascent ensure that the average payoffs to two players adopting this simple strategy are the same as the Nash payoffs they could achieve by adopting arbitrarily complex strategies.

In the remaining sections, we define our problem and the gradient ascent algorithm, and show that the behavior of players adapting via gradient ascent can be modeled as an affine dynamical system. Many properties of this dynamical system are known from the control theory literature, and have been applied before to the somewhat different setting of evolutionary game theory (Weibull, 1997). Our main technical contribution is a new and detailed geometric analysis of these dynamics in the setting of classical game theory, and particularly of the effects of the boundary conditions imposed by game theory on those dynamics (in contrast, evolutionary game theory explicitly and artificially prevents the dynamics from reaching the boundaries).

2 Problem Definition and Notation 

A two-player, two-action, general-sum game is defined by a pair of matrices

$$R = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$$

specifying the payoffs for the row player (player 1) and the column player (player 2), respectively. If the row player chooses action $i \in \{1,2\}$ and the column player chooses action $j \in \{1,2\}$, the payoff to the row player is $r_{ij}$ and the payoff to the column player is $c_{ij}$. Two cases of special interest are that of zero-sum games, in which the payoff of the column player and the payoff of the row player always sum to zero ($r_{ij} + c_{ij} = 0$ for $i,j \in \{1,2\}$), and that of team games, in which both players always get the same payoff ($r_{ij} = c_{ij}$ for $i,j \in \{1,2\}$).

The players can choose actions stochastically, in which case they are said to be following a mixed strategy. Let $0 \le \alpha \le 1$ denote the probability of the row player picking action 1, and let $0 \le \beta \le 1$ denote the probability of the column player picking action 1. Then $V_r(\alpha,\beta)$, the value or expected payoff of the strategy pair $(\alpha,\beta)$ to the row player, is

$$V_r(\alpha,\beta) = r_{11}(\alpha\beta) + r_{22}((1-\alpha)(1-\beta)) + r_{12}(\alpha(1-\beta)) + r_{21}((1-\alpha)\beta) \qquad (1)$$

and $V_c(\alpha,\beta)$, the value of the strategy pair $(\alpha,\beta)$ to the column player, is

$$V_c(\alpha,\beta) = c_{11}(\alpha\beta) + c_{22}((1-\alpha)(1-\beta)) + c_{12}(\alpha(1-\beta)) + c_{21}((1-\alpha)\beta). \qquad (2)$$

The strategy pair $(\alpha,\beta)$ is said to be a Nash equilibrium (or Nash pair) if (i) for any mixed strategy $\alpha'$, $V_r(\alpha',\beta) \le V_r(\alpha,\beta)$, and (ii) for any mixed strategy $\beta'$, $V_c(\alpha,\beta') \le V_c(\alpha,\beta)$. In other words, as long as one player plays their half of the Nash pair, the other player has no incentive to change their half of the Nash pair. It is well-known that every game of this kind has at least one Nash pair in the space of mixed (but not necessarily pure) strategies.
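As a concrete check of Equations 1 and 2 and of the Nash conditions, the following Python sketch evaluates both value functions and tests whether a strategy pair is a Nash pair. It assumes payoff matrices stored as zero-indexed nested lists (so `R[0][1]` holds $r_{12}$); the helper names `value_row`, `value_col` and `is_nash` are hypothetical. Because $V_r$ is linear in $\alpha$ and $V_c$ is linear in $\beta$, comparing against the pure deviations $\alpha', \beta' \in \{0,1\}$ is enough.

```python
def value_row(R, a, b):
    """Equation 1: expected payoff V_r(alpha, beta) to the row player.

    R[i][j] holds r_{(i+1)(j+1)}; a = P(row plays action 1), b = P(column plays action 1).
    """
    return (R[0][0] * a * b + R[1][1] * (1 - a) * (1 - b)
            + R[0][1] * a * (1 - b) + R[1][0] * (1 - a) * b)


def value_col(C, a, b):
    """Equation 2: expected payoff V_c(alpha, beta) to the column player."""
    return (C[0][0] * a * b + C[1][1] * (1 - a) * (1 - b)
            + C[0][1] * a * (1 - b) + C[1][0] * (1 - a) * b)


def is_nash(R, C, a, b, tol=1e-9):
    """Check Nash conditions (i) and (ii) up to a small tolerance.

    V_r is linear in alpha (for fixed beta) and V_c is linear in beta, so the
    best unilateral deviation is always a pure strategy: alpha', beta' in {0, 1}.
    """
    row_ok = all(value_row(R, a2, b) <= value_row(R, a, b) + tol for a2 in (0.0, 1.0))
    col_ok = all(value_col(C, a, b2) <= value_col(C, a, b) + tol for b2 in (0.0, 1.0))
    return row_ok and col_ok


# Matching pennies (a zero-sum game) as a worked example.
R = [[1, -1], [-1, 1]]
C = [[-1, 1], [1, -1]]
print(is_nash(R, C, 0.5, 0.5))  # True: the mixed pair (1/2, 1/2)
print(is_nash(R, C, 1.0, 1.0))  # False: the column player gains by switching
```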


3 Gradient Ascent for Iterated Games 


One can view the strategy pair $(\alpha,\beta)$ as a point in $\mathbb{R}^2$ constrained to lie in the unit square. The functions $V_r(\alpha,\beta)$ and $V_c(\alpha,\beta)$ then define two value surfaces over the unit square for the row and column players, respectively. For any given strategy pair $(\alpha,\beta)$, one can compute a gradient for the row player from the $V_r$-value surface and for the column player from the $V_c$-value surface as follows. Letting $u = (r_{11} + r_{22}) - (r_{21} + r_{12})$ and $u' = (c_{11} + c_{22}) - (c_{21} + c_{12})$, we have

$$\frac{\partial V_r(\alpha,\beta)}{\partial \alpha} = \beta u - (r_{22} - r_{12}) \qquad (3)$$

$$\frac{\partial V_c(\alpha,\beta)}{\partial \beta} = \alpha u' - (c_{22} - c_{21}). \qquad (4)$$
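Equations 3 and 4 can be sanity-checked numerically. The sketch below (same hypothetical list-of-lists layout; function names are illustrative) computes $u$, $u'$ and the two partial derivatives in closed form and compares them with central finite differences of Equations 1 and 2. Since each value function is linear in the player's own strategy, the two agree up to rounding. The prisoner's-dilemma payoffs are only an illustrative choice.

```python
def value_row(R, a, b):
    # Equation 1
    return (R[0][0] * a * b + R[1][1] * (1 - a) * (1 - b)
            + R[0][1] * a * (1 - b) + R[1][0] * (1 - a) * b)


def value_col(C, a, b):
    # Equation 2
    return (C[0][0] * a * b + C[1][1] * (1 - a) * (1 - b)
            + C[0][1] * a * (1 - b) + C[1][0] * (1 - a) * b)


def u_terms(R, C):
    """u = (r11 + r22) - (r21 + r12) and u' = (c11 + c22) - (c21 + c12)."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    return u, u_prime


def grad_row(R, C, a, b):
    """Equation 3: dV_r/dalpha = beta * u - (r22 - r12)."""
    u, _ = u_terms(R, C)
    return b * u - (R[1][1] - R[0][1])


def grad_col(R, C, a, b):
    """Equation 4: dV_c/dbeta = alpha * u' - (c22 - c21)."""
    _, u_prime = u_terms(R, C)
    return a * u_prime - (C[1][1] - C[1][0])


R = [[3, 0], [5, 1]]   # prisoner's dilemma, row player
C = [[3, 5], [0, 1]]   # prisoner's dilemma, column player
a, b, h = 0.3, 0.6, 1e-6
fd_row = (value_row(R, a + h, b) - value_row(R, a - h, b)) / (2 * h)
fd_col = (value_col(C, a, b + h) - value_col(C, a, b - h)) / (2 * h)
print(grad_row(R, C, a, b), fd_row)   # closed form vs. finite difference
print(grad_col(R, C, a, b), fd_col)
```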

In the gradient ascent algorithm, each player repeatedly adjusts their half of the current strategy pair in the direction of their current gradient with some step size $\eta$:

$$\alpha_{k+1} = \alpha_k + \eta \frac{\partial V_r(\alpha_k,\beta_k)}{\partial \alpha}, \qquad \beta_{k+1} = \beta_k + \eta \frac{\partial V_c(\alpha_k,\beta_k)}{\partial \beta} \qquad (5)$$

where $(\alpha_0,\beta_0)$ is an arbitrary starting strategy pair. Points on the boundary of the unit square (where at least one of $\alpha$ and $\beta$ is zero or one) have to be handled in a special manner, because the gradient may lead the players to an infeasible point outside the unit square. Therefore, for points on the boundary for which the gradient points outside the unit square, we redefine the gradient to be the projection of the true gradient onto the boundary. For ease of exposition, we do not change the notation in Equation 5 to reflect the projection of the gradient at the boundary, but the behavior there should be understood and is important to our analysis.
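A minimal finite-step version of Equation 5 might look as follows. This is a sketch under one stated assumption: instead of projecting the gradient explicitly, each coordinate is clipped to $[0,1]$ after the step. For a box-shaped feasible region, clipping removes exactly the outward component of a step taken from a boundary point, so it behaves like the projected gradient there; for an interior point whose step would overshoot, it stops at the boundary. The function names and the prisoner's-dilemma example are hypothetical.

```python
def grad_row(R, a, b):
    # Equation 3 with u = (r11 + r22) - (r21 + r12)
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    return b * u - (R[1][1] - R[0][1])


def grad_col(C, a, b):
    # Equation 4 with u' = (c11 + c22) - (c21 + c12)
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    return a * u_prime - (C[1][1] - C[1][0])


def clip01(x):
    """Keep a probability inside [0, 1]."""
    return min(1.0, max(0.0, x))


def gradient_ascent(R, C, a0, b0, eta=0.01, steps=5000):
    """Simultaneous gradient ascent (Equation 5) with clipping at the boundary.

    Both gradients are evaluated at the current pair (alpha_k, beta_k)
    before either player moves, as in Equation 5.
    """
    a, b = a0, b0
    trajectory = [(a, b)]
    for _ in range(steps):
        ga, gb = grad_row(R, a, b), grad_col(C, a, b)
        a, b = clip01(a + eta * ga), clip01(b + eta * gb)
        trajectory.append((a, b))
    return trajectory


# Prisoner's dilemma: action 2 (defect) is dominant for both players,
# so both gradients are negative everywhere and play drifts to (0, 0).
R = [[3, 0], [5, 1]]
C = [[3, 5], [0, 1]]
traj = gradient_ascent(R, C, 0.9, 0.9)
print(traj[-1])  # (0.0, 0.0)
```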


Note that the gradient ascent algorithm assumes a full information game, that is, both players know both game matrices and can see the mixed strategy of their opponent at the previous step. (However, if only the actual previous move played is visible, we can define a stochastic gradient ascent algorithm.)


4 Gradient Ascent as Affine Dynamical System


If the row and column players were to play according to the gradient ascent algorithm of Equation 5, they would at iteration $k$ play the strategy pair $(\alpha_k,\beta_k)$ and receive expected payoffs $V_r(\alpha_k,\beta_k)$ and $V_c(\alpha_k,\beta_k)$, respectively. We are interested in the performance of the two players over time. In particular, we are interested in what happens to the strategy pair and payoff sequences over time. It is well-known in game theory that the strategy pair sequence produced by following a gradient ascent algorithm may never converge (Owen, 1995). In this paper we prove that the average payoff of both players always converges to that of some Nash pair, regardless of whether the strategy pair sequence itself converges or not. Note that this also means that if the strategy pair sequence does converge, it must converge to a Nash pair.

For the purposes of analysis, it is convenient to first consider the gradient ascent algorithm for the limiting case of infinitesimal step size ($\eta \to 0$); hereafter we will refer to this as the IGA (for Infinitesimal Gradient Ascent) algorithm. Subsequently we will show that the asymptotic convergence properties of IGA also hold in the more practical case of gradient ascent with decreasing finite step size. In IGA, the sequence of strategy pairs becomes a continuous trajectory in the unit square (though there are discontinuities at the boundaries of the unit square because of the projected gradient). The basic intuition behind our analysis comes from viewing the two players behaving according to IGA as a dynamical system in $\mathbb{R}^2$. In particular, as we show below, the dynamics of the strategy pair trajectory are those of an affine dynamical system. This view does not take into account the constraint that the strategy pair has to lie in the unit square. This separation between the unconstrained dynamics and the constraints of the unit square will be useful throughout the rest of this paper.

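Using Equations 3, 4 and 5 and an infinitesimal step size, it is easy to show that the unconstrained dynamics of the strategy pair as a function of time are defined by the following differential equation:

$$\begin{bmatrix} \partial\alpha/\partial t \\ \partial\beta/\partial t \end{bmatrix} = \begin{bmatrix} 0 & u \\ u' & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} + \begin{bmatrix} -(r_{22} - r_{12}) \\ -(c_{22} - c_{21}) \end{bmatrix} \qquad (6)$$

We denote the off-diagonal matrix containing the terms $u$ and $u'$ in Equation 6 as $U$.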


Figure 1: The general form of the dynamics: a) when U has imaginary eigenvalues and b) when U has real eigenvalues.

From dynamical systems theory (Reinhard, 1987), it is known that if the matrix $U$ is invertible (we handle the non-invertible case separately below), the unconstrained strategy pair trajectories can only take the two possible qualitative forms shown in Figure 1. Notice that these two dynamics are very different: the one in Figure 1a has cyclic behavior (closed elliptical orbits), while the one in Figure 1b is divergent. Depending on the exact values of $u$ and $u'$, the ellipses in Figure 1a can become narrower or wider, or even reverse the direction of the flow. Similarly, the angle between the dashed axes in Figure 1b and the direction of flow along the axes will depend on $u$ and $u'$. But these are the two general forms of unconstrained dynamics that are possible. In the next section we define the characteristics of general-sum games that determine whether the unconstrained dynamics are elliptical or divergent.

The center where the axes of the ellipses meet, or where the dashed axes of the divergent dynamics meet, is the point at which the true gradient is zero. By setting the left-hand side of Equation 6 to zero and solving for the unique center $(\alpha^*,\beta^*)$, we get:

$$(\alpha^*, \beta^*) = \left( \frac{c_{22} - c_{21}}{u'}, \; \frac{r_{22} - r_{12}}{u} \right) \qquad (7)$$

Note that the center is in general not at $(0,0)$, and it may not even be inside the unit square.
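The affine form of Equation 6 and the center $(\alpha^*,\beta^*)$ are easy to compute directly. In this sketch (hypothetical helper names, same list-of-lists payoff layout), `affine_field` evaluates the right-hand side of Equation 6 and `center` implements the center formula above, refusing the non-invertible case $uu' = 0$. For matching pennies the center is $(1/2, 1/2)$ and the field vanishes there, as it should.

```python
def affine_field(R, C, a, b):
    """Right-hand side of Equation 6: U [alpha, beta]^T + constant vector."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    U = [[0.0, u], [u_prime, 0.0]]
    const = [-(R[1][1] - R[0][1]), -(C[1][1] - C[1][0])]
    d_alpha = U[0][0] * a + U[0][1] * b + const[0]
    d_beta = U[1][0] * a + U[1][1] * b + const[1]
    return d_alpha, d_beta


def center(R, C):
    """The unique zero (alpha*, beta*) of the unconstrained gradient field."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    if u == 0 or u_prime == 0:
        raise ValueError("U is not invertible, so there is no unique center")
    return (C[1][1] - C[1][0]) / u_prime, (R[1][1] - R[0][1]) / u


R = [[1, -1], [-1, 1]]   # matching pennies
C = [[-1, 1], [1, -1]]
a_star, b_star = center(R, C)
print((a_star, b_star))                    # (0.5, 0.5)
print(affine_field(R, C, a_star, b_star))  # (0.0, 0.0)
```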

5 Analysis of IGA 

The following is our main result:

Theorem 1 (Nash convergence of IGA in iterated general-sum games) If in a two-person, two-action, iterated general-sum game, both players follow the IGA algorithm, their average payoffs will converge in the limit to the expected payoffs for some Nash equilibrium. This will happen in one of two ways: 1) the strategy pair trajectory will itself converge to a Nash pair, or 2) the strategy pair trajectory will not converge, but the average payoffs of the two players will nevertheless converge to the expected payoffs of some Nash pair.




The proof of this theorem is complex and involves consideration of several special cases, and we present it in some detail below. But first we give some high-level intuition as to why the theorem is correct. First observe that if the strategy pair trajectory ever converges, it must be that it has reached a point with zero gradient (or zero projected gradient if the point is on the boundary of the unit square). It turns out that all such points must be Nash pairs because no improvement is possible for either player. More remarkably, it turns out that the average payoff along each ellipse in Figure 1a is exactly the expected payoff of the center (which is a point with zero gradient). But how is all this affected by the constraints of the unit square? Imagine taking a unit square and placing it anywhere in the plane of Figure 1a. The projected gradient along the boundary will be determined by which quadrant the boundary is in. We show that if there are some ellipses contained entirely in the unit square, the dynamics will converge to one such ellipse, and that if no ellipses are contained in the unit square (the center is outside the unit square), then the constrained dynamics must converge to a point. In either case, by the arguments above, the average payoff will become Nash. Similarly, imagine taking a unit square and placing it anywhere on the plane in Figure 1b. Given the gradient direction in each quadrant of the plane, we show that the dynamics will converge to some corner of the unit square. Again, the average payoff will become Nash.

From dynamical systems theory (Reinhard, 1987) it can be shown that we only need to consider three mutually exclusive and exhaustive cases to complete a proof:

1. $U$ is not invertible. This will happen whenever $u$ or $u'$ or both are zero. Such a case can occur in team, zero-sum, and general-sum games. Examples of the dynamics in such a case are shown in Figure 2.

2. $U$ is invertible and its eigenvalues are purely imaginary. We can compute the eigenvalues by solving for $\lambda$ in the following equation:

$$\det \begin{bmatrix} -\lambda & u \\ u' & -\lambda \end{bmatrix} = \lambda^2 - uu' = 0,$$

yielding $\lambda^2 = uu'$. Therefore we will get imaginary eigenvalues whenever $uu' < 0$. Such a case can occur in zero-sum and general-sum games but cannot happen in team games (because $u = u'$ and therefore $uu' > 0$). Two examples of the dynamics are shown in Figure 4.

3. $U$ is invertible and its eigenvalues are purely real. This will happen whenever $uu' > 0$. Such a case can occur in team and general-sum games but cannot happen in zero-sum games (because $u = -u'$ and therefore $uu' < 0$). Example dynamics are shown in Figure 6.
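The three-way case split depends only on $u$ and $u'$, so it can be written as a tiny classifier. The sketch below is illustrative (the function name and case labels are hypothetical); it also shows why a team game never lands in the imaginary case and a zero-sum game never lands in the real case, since then $u' = u$ and $u' = -u$ respectively.

```python
def classify(R, C):
    """Return which of the three proof cases a 2x2 game (R, C) falls into."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    if u == 0 or u_prime == 0:
        return "case 1: U not invertible"
    if u * u_prime < 0:
        return "case 2: purely imaginary eigenvalues"
    return "case 3: real eigenvalues"


def negate(M):
    """Column payoffs of the zero-sum game with row payoffs M."""
    return [[-x for x in row] for row in M]


pennies = [[1, -1], [-1, 1]]        # zero-sum when C = -R
coordination = [[1, 0], [0, 1]]     # team game when C = R
degenerate = [[1, 1], [0, 0]]       # u = (1 + 0) - (0 + 1) = 0

print(classify(pennies, negate(pennies)))    # case 2: purely imaginary eigenvalues
print(classify(coordination, coordination))  # case 3: real eigenvalues
print(classify(degenerate, degenerate))      # case 1: U not invertible
```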

Theorem 1 is proved below by showing that Nash convergence holds in all three cases summarized above. But before we analyze these three cases in sequence in the next three subsections, we present a basic result common to all three cases that shows that if the $(\alpha(t),\beta(t))$ trajectory ever converges to a point, then that point must be a Nash pair.

Lemma 2 (Convergence of strategy pair implies convergence to Nash equilibrium) If, in following IGA, $\lim_{t\to\infty} (\alpha(t),\beta(t)) = (\alpha_c,\beta_c)$, then $(\alpha_c,\beta_c)$ is a Nash pair. In other words, if the strategy pair trajectory converges at all, it converges to a Nash pair.

Proof: The strategy pair trajectory converges to a point only if the projected gradient at that point is exactly zero. This can happen in two ways: 1) the point is the center $(\alpha^*,\beta^*)$, where by definition the gradient is zero (this can only happen if the center is in the unit square), or 2) the point is on the boundary and the projected gradient is zero. Either way, it means that from that point no local improvement is possible. For a proof by contradiction, assume that such a point is not a Nash pair. Then for at least one of the players, say the column player, there must be a unilateral change that increases their payoff. Let the improved point be $(\alpha_c,\beta_1)$. Then for all $\epsilon \in (0,1]$, $(\alpha_c, (1-\epsilon)\beta_c + \epsilon\beta_1)$ must also be an improvement. This follows from the linear dependence of $V_c(\alpha,\beta)$ on $\beta$ and the fact that the unit square is a convex region. Therefore the projected gradient at $(\alpha_c,\beta_c)$ must be non-zero, a contradiction. □

Corollary 3 If the center $(\alpha^*,\beta^*)$ is in the unit square, it is a Nash pair.

5.1 U is not Invertible 

Lemma 4 (Nash convergence when U is not invertible) When the matrix U is not invertible, the IGA algorithm leads the strategy pair trajectory to converge to a point on the boundary that is a Nash pair.

Proof: First consider the case when exactly one of $u$ and $u'$ is zero. Without loss of generality assume that $u = 0$, i.e., $(r_{11} + r_{22}) = (r_{21} + r_{12})$. Then the gradient for the row player is constant (see Equation 3) and, depending on its sign, the row player will converge to either $\alpha = 0$ or $\alpha = 1$. Once the row player's strategy has converged, the gradient of the column player will also become constant (see Equation 4), so it too will converge to an extreme value, and the joint strategy will converge to some corner. If both $u$ and $u'$ are zero, then both gradients are constant and again we get convergence to a corner of the unit square.

In summary, if $U$ is not invertible, the gradient algorithm will lead to convergence to some point on the boundary of the unit square, and hence from Lemma 2 will lead to convergence to a Nash pair of the game. □

Figure 2: Example dynamics with U not invertible. a) In this case u is zero and u' is not. b) Both u and u' are zero.

Figure 2 shows the dynamics for two general-sum games in which $U$ is not invertible. Figure 2a is for a case where $u = 0$ and $u' > 0$. The gradient for the row player is constant and points downwards. The gradient for the column player depends on $\alpha$, but once $\alpha$ converges to zero it points to the right, and therefore from all starting points we get convergence to the bottom right corner. Figure 2b is for a case where both $u$ and $u'$ are zero. In this case both gradients are constant and we get piecewise straight-line dynamics.
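The behavior described for Figure 2a can be reproduced with a small simulation. The game below is hypothetical (not the one used for the figure), chosen so that $u = 0$ with a constant row gradient of $-1$, and $u' = 1 > 0$. The sketch uses clipped finite steps as a stand-in for IGA and then checks, through pure deviations, that the limiting corner is a Nash pair, as Lemma 4 predicts.

```python
def value(M, a, b):
    # Equations 1 and 2 share this bilinear form.
    return (M[0][0] * a * b + M[1][1] * (1 - a) * (1 - b)
            + M[0][1] * a * (1 - b) + M[1][0] * (1 - a) * b)


def clip01(x):
    return min(1.0, max(0.0, x))


def step(R, C, a, b, eta):
    """One clipped step of Equation 5 (clipping stands in for projection)."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    ga = b * u - (R[1][1] - R[0][1])          # Equation 3
    gb = a * u_prime - (C[1][1] - C[1][0])    # Equation 4
    return clip01(a + eta * ga), clip01(b + eta * gb)


R = [[0, 0], [1, 1]]   # u = (0 + 1) - (1 + 0) = 0: row gradient is constant -1
C = [[2, 0], [1, 0]]   # u' = (2 + 0) - (1 + 0) = 1 > 0
a, b = 0.8, 0.1
for _ in range(1000):
    a, b = step(R, C, a, b, eta=0.01)
print((a, b))  # (0.0, 1.0): alpha = 0 and beta = 1

# Pure-deviation check of the Nash conditions at the limit point.
row_ok = all(value(R, a2, b) <= value(R, a, b) for a2 in (0.0, 1.0))
col_ok = all(value(C, a, b2) <= value(C, a, b) for b2 in (0.0, 1.0))
print(row_ok and col_ok)  # True
```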

5.2 U has Purely Imaginary Eigenvalues 

Purely imaginary eigenvalues occur when $uu' < 0$, in which case the two eigenvalues are $\sqrt{|u||u'|}\,i$ and $-\sqrt{|u||u'|}\,i$. It can be shown that in such a case the unconstrained dynamics are elliptical around axes determined by the eigenvectors of $U$ (Reinhard, 1987). See Figure 3a for an illustration. There are two possible cases to consider: 1) $u > 0$ and $u' < 0$, and 2) $u < 0$ and $u' > 0$. However, without loss of generality we can consider only one case. When $u < 0$ and $u' > 0$,

$$\begin{bmatrix} \sqrt{|u|} \\ 0 \end{bmatrix} + i \begin{bmatrix} 0 \\ -\sqrt{|u'|} \end{bmatrix}$$

is a complex eigenvector corresponding to the eigenvalue $\sqrt{|u||u'|}\,i$, and

$$\begin{bmatrix} \sqrt{|u|} \\ 0 \end{bmatrix} - i \begin{bmatrix} 0 \\ -\sqrt{|u'|} \end{bmatrix}$$

is a complex eigenvector corresponding to the eigenvalue $-\sqrt{|u||u'|}\,i$. The axes of the ellipses in Figure 3a are determined by the real and imaginary parts of the two eigenvectors, that is (up to sign), by the vectors

$$\begin{bmatrix} \sqrt{|u|} \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ \sqrt{|u'|} \end{bmatrix}.$$

Note that these two vectors, and hence the axes of the ellipses, are always orthogonal to each other and parallel to the axes of the unit square. In the zero-sum case, because $|u| = |u'|$, they are also equal in length, which means that the dynamics in the zero-sum case are circular (we merely observe this but do not use it hereafter). Note that the ellipses are centered at $(\alpha^*,\beta^*)$ and that the unit square may lie anywhere in $\mathbb{R}^2$ relative to the center; therefore the center can be outside the unit square.
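The eigenvector claims for the case $u < 0 < u'$ can be checked with complex arithmetic. This sketch (hypothetical helper names, sample values $u = -4$, $u' = 1$) builds $U$, forms $v_+ = (\sqrt{|u|}, -i\sqrt{|u'|})$ and its complex conjugate $v_-$, and verifies $Uv = \lambda v$ for $\lambda = \pm\sqrt{|u||u'|}\,i$.

```python
import math


def matvec(U, v):
    """2x2 matrix times 2-vector (entries may be complex)."""
    return [U[0][0] * v[0] + U[0][1] * v[1], U[1][0] * v[0] + U[1][1] * v[1]]


def imaginary_case_eigenpairs(u, u_prime):
    """Eigenpairs of U = [[0, u], [u', 0]] when u < 0 < u'."""
    if not (u < 0 < u_prime):
        raise ValueError("this sketch only covers u < 0 < u'")
    omega = math.sqrt(abs(u) * abs(u_prime))
    v_plus = [math.sqrt(abs(u)), -1j * math.sqrt(abs(u_prime))]   # real part + i * imaginary part
    v_minus = [math.sqrt(abs(u)), 1j * math.sqrt(abs(u_prime))]   # complex conjugate of v_plus
    return [(1j * omega, v_plus), (-1j * omega, v_minus)]


u, u_prime = -4.0, 1.0
U = [[0.0, u], [u_prime, 0.0]]
for lam, v in imaginary_case_eigenpairs(u, u_prime):
    lhs, rhs = matvec(U, v), [lam * x for x in v]
    # The residual |Uv - lambda v| should be zero up to rounding.
    print(lam, max(abs(p - q) for p, q in zip(lhs, rhs)))
```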

Figure 3: a) Unconstrained dynamics when U has imaginary eigenvalues. b) The constrained dynamics when the center is in the unit square. In each case only some sample trajectories are shown.

We can solve the affine differential Equation 6 for the unconstrained dynamics of $\alpha$ and $\beta$ (in the case $u < 0 < u'$ considered above) to get:

$$\alpha(t) = B\sqrt{|u|}\cos\!\left(\sqrt{|uu'|}\,t + \phi\right) + \alpha^* \qquad (8)$$

and

$$\beta(t) = B\sqrt{|u'|}\sin\!\left(\sqrt{|uu'|}\,t + \phi\right) + \beta^* \qquad (9)$$

where $B$ and $\phi$ are constants dependent on the initial $\alpha$ and $\beta$. These are the equations for the ellipses of Figure 3a. Note that if an ellipse happens to lie entirely inside the unit square, then these equations also describe the constrained dynamics for any starting strategy pair that falls on that ellipse.
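Equations 8 and 9 can be checked against Equation 6 numerically. With the center substituted in, Equation 6 reads $\dot\alpha = u(\beta - \beta^*)$ and $\dot\beta = u'(\alpha - \alpha^*)$. The sketch below uses arbitrary, hypothetical sample values for $u$, $u'$, $B$, $\phi$ and the center, and compares central-difference derivatives of the closed-form ellipse with that right-hand side.

```python
import math


def ellipse(t, u, u_prime, B, phi, a_star, b_star):
    """Equations 8 and 9 (case u < 0 < u')."""
    w = math.sqrt(abs(u * u_prime))
    a = B * math.sqrt(abs(u)) * math.cos(w * t + phi) + a_star
    b = B * math.sqrt(abs(u_prime)) * math.sin(w * t + phi) + b_star
    return a, b


def ode_residual(t, u, u_prime, B, phi, a_star, b_star, h=1e-5):
    """d/dt of the ellipse minus the right-hand side of Equation 6."""
    args = (u, u_prime, B, phi, a_star, b_star)
    a_plus, b_plus = ellipse(t + h, *args)
    a_minus, b_minus = ellipse(t - h, *args)
    da_dt = (a_plus - a_minus) / (2 * h)
    db_dt = (b_plus - b_minus) / (2 * h)
    a, b = ellipse(t, *args)
    return da_dt - u * (b - b_star), db_dt - u_prime * (a - a_star)


params = dict(u=-2.0, u_prime=0.5, B=0.1, phi=0.3, a_star=0.5, b_star=0.4)
worst = max(max(abs(r) for r in ode_residual(t, **params)) for t in (0.0, 0.7, 1.9, 4.2))
print(worst)  # tiny: only finite-difference and rounding error remain
```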

Lemma 5 (Nash Average Payoff for Ellipse entirely inside unit square) For any initial strategy pair $(\alpha_0,\beta_0)$, if the trajectory given by Equations 8 and 9 lies entirely within the unit square, then the average payoffs along that trajectory are exactly the expected payoffs of a Nash pair.

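Proof: Under the assumption that the ellipse lies entirely in the unit square, the average payoff for the row player can be computed by integrating, over one period of the ellipse, the value obtained by the row player in Equation 1, where the $\alpha$ and $\beta$ trajectories are those specified in Equations 8 and 9. Because Equation 1 is bilinear in $\alpha$ and $\beta$, this substitution produces a constant part plus terms proportional to the cosine, the sine, and the product of the cosine and sine. It can be shown that the integral over one period of just the cosine term, just the sine term, and the product of the cosine and sine terms is exactly zero. This leaves just the terms containing $\alpha^*$ and $\beta^*$, that is, $V_r(\alpha^*,\beta^*)$; the same argument applies to the column player. Therefore, the average payoffs are exactly the expected payoffs of the center, which by Corollary 3 is a Nash pair. □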

Therefore, when the center point of the ellipses is in the interior of the unit square, all ellipses around it that lie entirely within the unit square have Nash average payoffs.
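Lemma 5 can be illustrated numerically. The sketch below uses a hypothetical game with $u = -4$ and $u' = 4$ whose center is $(1/2, 1/2)$, places an ellipse from Equations 8 and 9 inside the unit square, and averages both players' payoffs over equally spaced times spanning one full period. Equally spaced samples make the discrete averages of $\cos$, $\sin$ and $\sin\cos$ vanish (up to rounding), so the averages match the payoffs at the center even though the payoff along the trajectory keeps changing.

```python
import math


def value(M, a, b):
    # Bilinear value function shared by Equations 1 and 2.
    return (M[0][0] * a * b + M[1][1] * (1 - a) * (1 - b)
            + M[0][1] * a * (1 - b) + M[1][0] * (1 - a) * b)


R = [[0, 3], [2, 1]]   # u  = (0 + 1) - (2 + 3) = -4
C = [[2, 0], [1, 3]]   # u' = (2 + 3) - (1 + 0) =  4
u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
a_star = (C[1][1] - C[1][0]) / u_prime   # center (alpha*, beta*)
b_star = (R[1][1] - R[0][1]) / u

B, phi, n = 0.1, 0.4, 360
w = math.sqrt(abs(u * u_prime))
period = 2 * math.pi / w
points = []
for k in range(n):
    t = k * period / n
    a = B * math.sqrt(abs(u)) * math.cos(w * t + phi) + a_star        # Equation 8
    b = B * math.sqrt(abs(u_prime)) * math.sin(w * t + phi) + b_star  # Equation 9
    points.append((a, b))

inside = all(0 <= a <= 1 and 0 <= b <= 1 for a, b in points)  # ellipse lies in the square
avg_row = sum(value(R, a, b) for a, b in points) / n
avg_col = sum(value(C, a, b) for a, b in points) / n
print(inside)
print(avg_row, value(R, a_star, b_star))  # both 1.5 up to rounding
print(avg_col, value(C, a_star, b_star))  # both 1.5 up to rounding
```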

Finally, we are ready to prove that when $U$ has imaginary eigenvalues, the average payoffs of the two players are always those of some Nash pair.

Lemma 6 (Nash Convergence in the case of imaginary eigenvalues) When the matrix U has imaginary eigenvalues, the IGA algorithm either leads the strategy pairs to converge to a point on the boundary that is a Nash pair, or else the strategy pairs do not converge, but the average payoff of each player converges in the limit to that of some Nash pair.

Proof: Consider again the unconstrained dynamics of Figure 3a. The four quadrants have the following general properties: in quadrant A the gradient has a positive component in the down and right directions, in quadrant B the gradient has a positive component in the down and left directions, in quadrant C the gradient has a positive component in the up and left directions, and in quadrant D the gradient has a positive component in the up and right directions. The direction of the gradient on the boundaries between the quadrants is also shown in Figure 3a. The important observation here is that the direction of the gradient in each quadrant is such that there is a clockwise cycle through the quadrants.

There are three possible cases to consider depending on the location of the center $(\alpha^*,\beta^*)$.

1. Center is in the interior of the unit square.

First observe that all boundaries are tangent to some ellipse, and that at least one boundary is tangent to an ellipse that lies entirely within the unit square. For example, in Figure 3b the tangent ellipse to the left-side boundary lies wholly inside the unit square, while the other three boundaries' tangent ellipses are not contained in the unit square.

If the initial strategy pair coincides with the center, we will get immediate convergence to a Nash equilibrium because the gradient there is zero. If the initial strategy pair is off the center point, then one of two things can happen: 1) the ellipse that passes through the initial point does not intersect the boundary, or 2) the ellipse that passes through the initial point intersects a boundary. In the first case the dynamics will just follow the ellipse, and by Lemma 5 above, the average payoff for both players will be Nash, even though the strategy pairs themselves will not converge, but will follow the ellipse forever. In Figure 3b this will happen if the initial strategy pair is inside or on the dashed ellipse. In case 2) above, the strategy pair trajectory will hit a boundary, and then travel along it until it reaches a point at which the boundary is tangent to some ellipse that may or may not lie entirely in the unit square. If it does, then the trajectory will follow that ellipse thereafter. If it does not, then the trajectory will follow the tangent ellipse to the next boundary in the clockwise direction. This process will repeat until the boundary that has a tangent ellipse lying entirely within the unit square is reached. In Figure 3b, if the initial strategy pair starts anywhere outside the dashed ellipse, the dynamics will eventually follow the dashed ellipse. In all cases, from Lemma 5 we will get asymptotic convergence to the expected payoffs of some Nash pair.

2. Center is on the Boundary. Consider the case where the center is on the left-side boundary of the unit square. The first observation is that all points below the center on the left-side boundary will then have a projected gradient of zero. (Figure 3a shows that the gradient at such points will point left, and therefore outside the unit square and perpendicular to the left-side boundary.) The bottom boundary will have a projected gradient to the left. No matter where we start, we will either hit the bottom boundary, in which case we will get convergence to the lower left corner of the unit square, or we will hit the left boundary below the center, in which case again we will converge because of the zero projected gradient there. In either case, from Lemma 2 such points will be Nash pairs. By symmetry, a similar argument is easily constructed when the center is on some other boundary of the unit square.

3. Center outside unit square. There are two cases to consider: 1) the unit square lies entirely inside one quadrant, and 2) the unit square lies entirely inside two adjacent quadrants. No other case is possible. First consider the situation when the unit square lies entirely within quadrant A. Then the gradient at each point has positive components down and right (see Figure 3a), and hence we will get convergence to the bottom right corner of the unit square. A similar argument is easily constructed when the unit square lies entirely within some other quadrant (yielding convergence to some other corner of the unit square).

Next consider the case where the unit square lies in quadrants A and D. In quadrant D the gradient points right and up, so either the trajectory will enter quadrant A without hitting the top boundary, or it will hit the top boundary, in which case the projected gradient will be towards the right, and again it will enter quadrant A. In quadrant A we will converge to the lower right corner of the unit square as above. A similar argument is easily constructed when the unit square lies within some other two adjacent quadrants (again yielding convergence to some corner of the unit square).

□ 
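Case 3 of the proof can be illustrated with a hypothetical game (not one from the figures) chosen so that $u = -1$, $u' = 1$ and the center is $(-1/2, -1/2)$. The whole unit square then lies in quadrant A, where the gradient points down (decreasing $\alpha$) and right (increasing $\beta$), so the trajectory should end at the corner $\alpha = 0$, $\beta = 1$. As before, clipped finite steps stand in for IGA in this sketch.

```python
def clip01(x):
    return min(1.0, max(0.0, x))


def run(R, C, a, b, eta=0.01, steps=2000):
    """Clipped simultaneous gradient ascent (Equations 3-5)."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    for _ in range(steps):
        ga = b * u - (R[1][1] - R[0][1])
        gb = a * u_prime - (C[1][1] - C[1][0])
        a, b = clip01(a + eta * ga), clip01(b + eta * gb)
    return a, b


R = [[0.0, 0.0], [1.5, 0.5]]   # u  = (0 + 0.5) - (1.5 + 0) = -1
C = [[1.5, 0.0], [0.5, 0.0]]   # u' = (1.5 + 0) - (0.5 + 0) =  1
u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
center = ((C[1][1] - C[1][0]) / u_prime, (R[1][1] - R[0][1]) / u)
print(u * u_prime < 0, center)   # True (-0.5, -0.5): imaginary case, center outside
print(run(R, C, 0.5, 0.5))       # (0.0, 1.0)
```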

Figure 4: Example dynamics when U has imaginary eigenvalues. a) The center is in the unit square. b) The center is on the left boundary of the unit square.

In Figure 4 we present examples of strategy pair trajectories for example problems whose U matrices have imaginary eigenvalues. The left-hand figure is for a case where the center is contained in the unit square, while the right-hand figure is for a case where the center is on the left-hand boundary of the unit square.


5.3 U has Real Eigenvalues 



Figure 5: a) General characteristics of the unconstrained dynamics when U has real eigenvalues. b) The possible transitions between the quadrants of the left-hand figure.

The unconst.rained dynamics of a. linea.r differential 
system wit.h rea.l eigenva.lues are known to be diver- 


gent. (Reinhard, 1987). See Figure lb) for a.n illus¬ 
tration. Thus, wit.hout. t.he const.ra.ints of t.he unit. 
squa.re, t.he st.ra.t.egy pa.ir t.ra.ject.ory would divergÄ 
Figure 5a shows the crucial general properties of the unconstrained dynamics. The center $(\alpha^*, \beta^*)$ is the point where the gradient is zero. Everywhere in quadrant A the gradient has a positive component in the right direction and in the up direction; in quadrant B the gradient has a positive component in the up direction and in the left direction; in quadrant C the gradient has a positive component in the left and down directions; and in quadrant D the gradient has a positive component in the right and down directions. On the boundary between two adjacent quadrants one component of the gradient vanishes, and the gradient points in the direction the two quadrants share: at the boundary between quadrants A and D it points right; at the boundary between A and B it points up; at the boundary between B and C it points left; and at the boundary between C and D it points down. The unit square that defines the feasible range of strategy pairs can be anywhere relative to the center. The eigenvectors corresponding to the two real eigenvalues, $\sqrt{uu'}$ and $-\sqrt{uu'}$, are

$$
\begin{pmatrix} 1 \\ \sqrt{u'/u} \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 1 \\ -\sqrt{u'/u} \end{pmatrix},
$$

respectively (this is for the case that $u, u' > 0$; the analysis for the case that $u, u' < 0$ is analogous and omitted). The eigenvectors are represented in Figure 5a with dashed lines: one by drawing a line through the center in the direction $(1, \sqrt{u'/u})$, and the other by drawing a line through the center in the direction $(1, -\sqrt{u'/u})$. Note that the general qualitative characteristics of the positive components of the gradient in the different quadrants do not depend on the eigenvectors. However, the eigenvectors are relevant to the detailed dynamics, as we will see in the examples below.
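As a quick numerical check of this picture, the sketch below evaluates the unconstrained velocity at one probe point per quadrant and computes the eigenpairs. It assumes the unconstrained dynamics are $U$ applied to the offset $(\alpha - \alpha^*, \beta - \beta^*)$ with $U = \begin{pmatrix} 0 & u \\ u' & 0 \end{pmatrix}$, which is consistent with the eigenpairs stated above. It also assumes a quadrant layout with $\alpha$ on the horizontal axis and $\beta$ on the vertical axis (A upper right, B lower right, C lower left, D upper left), the layout that matches the directions listed in the text. The helper names and the values $u = 2$, $u' = 8$ are hypothetical.

```python
import math

# Hypothetical helper names. U = [[0, u], [u', 0]] is assumed to be the matrix of
# the unconstrained IGA dynamics (consistent with the eigenpairs in the text).

def velocity(alpha, beta, u, u_prime, center):
    """Unconstrained (d alpha/dt, d beta/dt): U applied to the offset from the center."""
    a_star, b_star = center
    return u * (beta - b_star), u_prime * (alpha - a_star)

def eigenpairs(u, u_prime):
    """Eigenvalues +/- sqrt(u u') with eigenvectors (1, +/- sqrt(u'/u)), for u, u' > 0."""
    if u <= 0 or u_prime <= 0:
        raise ValueError("this sketch only covers the case u, u' > 0")
    lam, slope = math.sqrt(u * u_prime), math.sqrt(u_prime / u)
    return [(lam, (1.0, slope)), (-lam, (1.0, -slope))]

def quadrant_signs(u, u_prime, center, offset=0.1):
    """Signs of the two velocity components at one probe point per quadrant.

    Assumed layout: alpha horizontal, beta vertical; A upper right,
    B lower right, C lower left, D upper left.
    """
    a, b = center
    probes = {"A": (a + offset, b + offset), "B": (a + offset, b - offset),
              "C": (a - offset, b - offset), "D": (a - offset, b + offset)}
    signs = {}
    for name, (x, y) in probes.items():
        d_alpha, d_beta = velocity(x, y, u, u_prime, center)
        # +1 means right (alpha) or up (beta); -1 means left or down.
        signs[name] = (int(math.copysign(1, d_alpha)), int(math.copysign(1, d_beta)))
    return signs

if __name__ == "__main__":
    print(eigenpairs(2.0, 8.0))
    print(quadrant_signs(2.0, 8.0, (0.5, 0.5)))
```

With $u = 2$ and $u' = 8$ the eigenvalues are 4 and -4 with eigenvectors (1, 2) and (1, -2), and the sign pairs come out as right/up for A, left/up for B, left/down for C and right/down for D.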

Lemma 7 (Nash convergence in the case of real eigenvalues) For the case of U having real eigenvalues, the IGA algorithm leads the strategy pair trajectory to converge to a point on the boundary that is a Nash pair.

Proof: Consider the graph of possible transitions between the quadrants in Figure 5b. From every point inside quadrant A, the gradient is such that the strategy pair will never leave that quadrant. Therefore if the strategy pair trajectory ever enters quadrant A, it will converge to the top right corner of the unit square. Similarly, from every point inside quadrant C, the gradient is such that the strategy pair will never leave that quadrant. Thus if the strategy pair trajectory ever enters quadrant C, it will converge to the lower left corner of the unit square. If the initial strategy pair is in quadrant B or D, the dynamics are a bit more complex because they depend on the location of the unit square relative to the center. We consider only the case of quadrant D, for by symmetry a similar analysis will hold for quadrant B. Unless the unit square lies entirely in quadrant D, the strategy pair trajectory will enter quadrant C or quadrant A, in which case we will get convergence to the associated corner as above. If the unit square is entirely within quadrant D, then the direction of the gradient in that quadrant will lead to convergence to the lower right corner of the unit square. Finally, if the right-hand boundary of the unit square is on the boundary between quadrants D and A, then all the points on that boundary will have zero projected gradient, and any trajectory from D hitting that boundary will converge there. Similarly, if the bottom boundary of the unit square is aligned with the boundary between quadrants C and D, then any trajectory from D hitting that boundary will converge there. From Lemma 2, if we ever get convergence to a strategy pair, it must be a Nash pair. □
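The sketch below illustrates Lemma 7 on a coordination game whose $U$ has real eigenvalues ($u = u' = 3$, center $(1/3, 1/3)$). It approximates IGA with a small fixed step and projection onto the unit square, which is only an approximation of the infinitesimal-step algorithm, and checks each limit with a best-response test. The gradient formulas restate the 2×2 payoff parameterization assumed from the earlier sections ($u = (r_{11} + r_{22}) - (r_{21} + r_{12})$, $\partial V_r/\partial\alpha = \beta u - (r_{22} - r_{12})$, and analogously for the column player); the game, start points and helper names are hypothetical.

```python
# Hypothetical helper names. The gradient formulas restate the parameterization
# assumed from earlier sections: R[i][j] = r_{i+1, j+1}, C[i][j] = c_{i+1, j+1},
# alpha = P(row plays action 1), beta = P(column plays action 1).

def gradients(alpha, beta, R, C):
    """Return (dV_r/d alpha, dV_c/d beta) for a 2x2 general-sum game."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    return beta * u - (R[1][1] - R[0][1]), alpha * u_prime - (C[1][1] - C[1][0])

def clip01(x):
    return min(1.0, max(0.0, x))

def approximate_iga(alpha, beta, R, C, eta=0.01, steps=5000):
    """Fixed small-step Euler approximation of IGA, projected onto the unit square."""
    for _ in range(steps):
        g_alpha, g_beta = gradients(alpha, beta, R, C)
        alpha, beta = clip01(alpha + eta * g_alpha), clip01(beta + eta * g_beta)
    return alpha, beta

def is_nash(alpha, beta, R, C, tol=1e-9):
    """Each player's payoff is linear in its own probability, so a strategy is a
    best response exactly when its projected gradient is zero."""
    g_alpha, g_beta = gradients(alpha, beta, R, C)

    def best_response(p, g):
        if p >= 1.0 - tol:
            return g >= -tol   # at 1: moving down must not help
        if p <= tol:
            return g <= tol    # at 0: moving up must not help
        return abs(g) <= tol   # interior: indifferent between the two actions

    return best_response(alpha, g_alpha) and best_response(beta, g_beta)

if __name__ == "__main__":
    # Coordination (team) game: u = u' = 3 > 0 (real eigenvalues), center (1/3, 1/3).
    R = C = [[2.0, 0.0], [0.0, 1.0]]
    for start in [(0.6, 0.6), (0.2, 0.2)]:
        end = approximate_iga(*start, R, C)
        print(start, "->", end, "Nash:", is_nash(*end, R, C))
```

Starting in quadrant A at (0.6, 0.6) the pair reaches the top right corner (1, 1), and starting in quadrant C at (0.2, 0.2) it reaches the lower left corner (0, 0). Both corners pass the Nash check, as does the mixed center (1/3, 1/3), which the dynamics do not reach from these starts.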

Figure 6: Example dynamics when U has real eigenvalues. a) Center in the unit square. b) Center on the right boundary of the unit square.

In Figure 6 we present examples of strategy pair trajectories for example problems whose U matrices have real eigenvalues. The left-hand figure has the center in the unit square; the right-hand figure has the center on the right-hand boundary of the unit square. The locations of the Nash points are shown.

6 Finite Decreasing Step Size 

The above analysis and Theorem 1 have concerned the IGA algorithm, which assumes that both players follow the true gradient with infinitesimal step sizes. In practice, of course, the two players would use the gradient ascent algorithm of Equation 5 with a decreasing finite step size $\eta_k$ (where $k$ is the iteration number).
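A minimal sketch of this finite-step setting follows. It assumes that Equation 5 is the simultaneous update $\alpha_{k+1} = \alpha_k + \eta_k\, \partial V_r/\partial\alpha$, $\beta_{k+1} = \beta_k + \eta_k\, \partial V_c/\partial\beta$, with each probability projected back onto $[0, 1]$, and it restates the gradient formulas for the 2×2 parameterization of the earlier sections rather than quoting them. The schedule $\eta_k = \eta_0/\sqrt{k}$, the helper names and the prisoner's dilemma payoffs are hypothetical choices for illustration, not necessarily among the schedules covered by Theorem 8.

```python
import math

# Hypothetical names; R[i][j] = r_{i+1, j+1}, C[i][j] = c_{i+1, j+1}.

def gradients(alpha, beta, R, C):
    """(dV_r/d alpha, dV_c/d beta) under the assumed 2x2 parameterization."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    return beta * u - (R[1][1] - R[0][1]), alpha * u_prime - (C[1][1] - C[1][0])

def step_size(k, eta0=0.1):
    """Hypothetical decreasing schedule eta_k = eta0 / sqrt(k), k = 1, 2, ..."""
    return eta0 / math.sqrt(k)

def gradient_ascent(alpha, beta, R, C, iterations=1000, schedule=step_size):
    """Both players step simultaneously along their own payoff gradient, then
    project back onto [0, 1]; returns the full strategy-pair trajectory."""
    trajectory = [(alpha, beta)]
    for k in range(1, iterations + 1):
        eta = schedule(k)
        g_alpha, g_beta = gradients(alpha, beta, R, C)  # both use the old pair
        alpha = min(1.0, max(0.0, alpha + eta * g_alpha))
        beta = min(1.0, max(0.0, beta + eta * g_beta))
        trajectory.append((alpha, beta))
    return trajectory

if __name__ == "__main__":
    # Prisoner's dilemma with action 1 = cooperate: u = u' = -1, so uu' > 0.
    R = [[3.0, 0.0], [5.0, 1.0]]
    C = [[3.0, 5.0], [0.0, 1.0]]
    print(gradient_ascent(0.9, 0.9, R, C)[-1])
```

In this prisoner's dilemma both partial derivatives are negative everywhere in the unit square, so the pair moves monotonically to (0, 0), mutual defection, which is the game's only Nash pair; it is an instance of the first outcome described for decreasing step sizes.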

Theorem 8 The sequence of strategy pairs produced by both players following the gradient ascent algorithm of Equation 5 with a decreasing step size (several schedules will work, e.g., $\eta_k = \ldots$) will satisfy one of the following two properties: 1) it will converge to a Nash pair, or 2) the strategy pair sequence will not converge, but the average payoff will converge in the limit to that of some Nash pair.

Proof: (Sketch) Here we provide some intuition; the full proof is deferred to the full paper. Consider first the cases in which the IGA algorithm converges to a Nash pair. The proofs in such cases exploited only the direction of the gradient in the four quadrants around the center. These same proofs will extend without modification to the case of decreasing step sizes. The one case (with imaginary eigenvalues) in which the IGA algorithm does not converge to a point but instead converges to some ellipse fully contained in the unit square is more complex to handle. The basic intuition is that the strategy pairs cannot get "trapped" anywhere, and as the step size decreases, the dynamics of gradient ascent approach the dynamics of IGA. □
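To see the non-convergent case on a concrete game, the sketch below runs finite-step gradient ascent on matching pennies. There $u = 4$ and $u' = -4$, so $U$ has imaginary eigenvalues, and the center $(1/2, 1/2)$ is the mixed Nash pair, with payoff 0 to each player. The schedule $\eta_k = 0.01/\sqrt{k}$, the starting point and the iteration count are hypothetical choices; the sketch only shows, over one finite run, that the strategies keep circling while the running average of the row player's payoff stays close to 0.

```python
import math

# Matching pennies: row payoffs [[1, -1], [-1, 1]]; the column player gets the negation.
# Hypothetical parameters: eta_k = 0.01 / sqrt(k), start (0.75, 0.5).

def row_payoff(alpha, beta):
    """V_r(alpha, beta) = (2 alpha - 1)(2 beta - 1) for matching pennies."""
    return (2 * alpha - 1) * (2 * beta - 1)

def run(alpha=0.75, beta=0.5, iterations=100_000, eta0=0.01):
    total = 0.0
    for k in range(1, iterations + 1):
        total += row_payoff(alpha, beta)
        eta = eta0 / math.sqrt(k)
        # dV_r/d alpha = 4 beta - 2 and dV_c/d beta = 2 - 4 alpha (u = 4, u' = -4).
        g_alpha, g_beta = 4 * beta - 2, 2 - 4 * alpha
        alpha = min(1.0, max(0.0, alpha + eta * g_alpha))
        beta = min(1.0, max(0.0, beta + eta * g_beta))
    return alpha, beta, total / iterations

if __name__ == "__main__":
    alpha, beta, avg = run()
    radius = math.hypot(2 * alpha - 1, 2 * beta - 1)
    print(f"final pair ({alpha:.3f}, {beta:.3f}), distance from center {radius:.3f}")
    print(f"average row payoff {avg:.4f} (Nash payoff is 0)")
```

In this run the distance of $(2\alpha - 1, 2\beta - 1)$ from the origin stays near its starting value of 0.5, so the strategy pair does not settle, yet the average row payoff printed at the end is close to the Nash value 0.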

7 Conclusion 

Algorithms based on gradient ascent in expected payoff are natural candidates for adapting strategies in repeated games. In this work we analyzed the performance of such algorithms in the base case of two-person, two-action, iterated general-sum games, and showed that even though the strategies of the two players may not converge, the asymptotic average payoff of the two players always converges to the expected payoff of some Nash equilibrium. Our proof also provides some insight into special classes of games, such as zero-sum and team games.

In the future we will study the behavior of gradient ascent algorithms in complex multi-action and continuous-action games in which the players use parameterized representations of strategies.